The number of Internet of Things (IoT) devices is predicted to reach 125 billion by 2023. This growth will intensify collisions between devices and thereby degrade communication performance. Selecting appropriate transmission parameters, such as the channel and spreading factor (SF), can effectively reduce collisions between long-range (LoRa) devices. However, most schemes proposed in the current literature are not easy to implement on IoT devices with limited computational complexity and memory. To address this problem, we propose a lightweight transmission parameter selection scheme, i.e., a joint channel and SF selection scheme using reinforcement learning for low-power wide-area networks (LoRaWAN). In the proposed scheme, appropriate transmission parameters can be selected using only acknowledgment (ACK) information. Furthermore, we theoretically analyze the computational complexity and memory requirements of our proposed scheme, which verifies that it can select transmission parameters with extremely low computational complexity and memory requirements. In addition, extensive experiments were conducted on real-world LoRa devices to evaluate the effectiveness of our proposed scheme. The experimental results demonstrate the following main findings: (1) Compared with other lightweight transmission parameter selection schemes, our proposed scheme can effectively avoid collisions between LoRa devices in LoRaWAN, regardless of changes in the available channels. (2) The frame success rate (FSR) can be improved by selecting both the access channel and the SF, rather than the access channel alone. (3) Because interference exists between adjacent channels, the FSR and fairness can be improved by increasing the spacing between adjacent available channels.
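The abstract does not spell out the algorithm, so the sketch below is only a minimal illustration of the idea (the class name, the ε-greedy strategy, and the channel/SF values are assumptions, not the paper's exact scheme): a bandit over (channel, SF) arms that learns from nothing but ACK feedback.

```python
import random

class JointChannelSFSelector:
    """Hypothetical epsilon-greedy bandit over (channel, SF) pairs.

    The reward is 1 when an ACK is received and 0 otherwise, so the
    device needs no signaling beyond the LoRaWAN ACKs it already gets.
    """

    def __init__(self, channels, sfs, epsilon=0.1):
        self.arms = [(c, s) for c in channels for s in sfs]
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}    # transmissions per arm
        self.values = {a: 0.0 for a in self.arms}  # running mean ACK rate

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, ack_received):
        # Incremental mean: O(1) time per update.
        self.counts[arm] += 1
        r = 1.0 if ack_received else 0.0
        self.values[arm] += (r - self.values[arm]) / self.counts[arm]
```

Each arm needs only one counter and one running mean, so memory grows with |channels| x |SFs| and every update is O(1), which is the kind of footprint the abstract's complexity analysis targets.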
translated by 谷歌翻译
Text-to-speech synthesis (TTS) is the task of converting text into speech. Two of the factors that have been driving TTS are advances in probabilistic models and latent representation learning. We propose a TTS method based on latent variable conversion using a diffusion probabilistic model and a variational autoencoder (VAE). In our TTS method, we use a waveform model based on the VAE, a diffusion model that predicts the distribution of the waveform model's latent variables from text, and an alignment model that learns alignments between the text and speech latent sequences. Our method integrates diffusion with the VAE by modeling both the mean and variance parameters with diffusion, where the target distribution is determined by approximation from the VAE. This latent variable conversion framework potentially enables us to flexibly incorporate various latent feature extractors. Our experiments show that our method is robust to linguistic labels with poor orthography and to alignment errors.
End-to-end text-to-speech synthesis (TTS) can generate highly natural synthetic speech from raw text. However, rendering correct pitch accents is still a challenging problem for end-to-end TTS. To tackle the challenge of rendering correct pitch accents in Japanese end-to-end TTS, we adopt PnG~BERT, a self-supervised pretrained model in the character and phoneme domain, for TTS. We investigate the effects of features captured by PnG~BERT on Japanese TTS by modifying the fine-tuning condition to determine which conditions are helpful for inferring pitch accents. We manipulate the content of PnG~BERT features from text-oriented to speech-oriented by changing the number of fine-tuned layers during TTS training. In addition, we teach PnG~BERT pitch accent information by fine-tuning with tone prediction as an additional downstream task. Our experimental results show that the features captured by PnG~BERT during pretraining contain information helpful for inferring pitch accents, and that PnG~BERT outperforms a baseline Tacotron on accent correctness in a listening test.
Transparency of machine learning models used for decision support in various industries is becoming essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used to explain the predictions of black-box machine learning models to customers and developers. However, a parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of the background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and the Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrate a significant decrease (by at least a factor of 1.75) in feature attribution discrepancies among the users of distributed machine learning.
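The paper's three algorithms are not reproduced here; as a small self-contained illustration of why background data matters (the root cause of the discrepancies described above), the following computes exact Shapley values under the mean-imputation convention that KernelSHAP approximates. The linear model and background datasets are made up for the demonstration.

```python
import itertools
import math

def shapley_values(f, x, background):
    """Exact Shapley attributions for a single instance x.

    Features outside a coalition are replaced by their mean over the
    background dataset -- the convention KernelSHAP approximates.
    """
    n = len(x)
    base = [sum(row[i] for row in background) / len(background) for i in range(n)]
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(others, k):
                # Shapley kernel weight |S|!(n-|S|-1)!/n!
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                with_i = [x[j] if (j in S or j == i) else base[j] for j in range(n)]
                without_i = [x[j] if j in S else base[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi
```

For a linear model f(v) = 2*v0 + 3*v1 and instance x = (2, 2), a party whose background averages to (0, 0) attributes (4, 6), while a party whose background averages to (1, 1) attributes (2, 3): same model, same instance, different explanations.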
Artificial life is a research field studying what processes and properties define life, based on a multidisciplinary approach spanning the physical, natural and computational sciences. Artificial life aims to foster a comprehensive study of life beyond "life as we know it" and towards "life as it could be", with theoretical, synthetic and empirical models of the fundamental properties of living systems. While still a relatively young field, artificial life has flourished as an environment for researchers with different backgrounds, welcoming ideas and contributions from a wide range of subjects. Hybrid Life is an attempt to bring attention to some of the most recent developments within the artificial life community, rooted in more traditional artificial life studies but looking at new challenges emerging from interactions with other fields. In particular, Hybrid Life focuses on three complementary themes: 1) theories of systems and agents, 2) hybrid augmentation, with augmented architectures combining living and artificial systems, and 3) hybrid interactions among artificial and biological systems. After discussing some of the major sources of inspiration for these themes, we will focus on an overview of the works that appeared in Hybrid Life special sessions, hosted by the annual Artificial Life Conference between 2018 and 2022.
Mutation-based fuzzing has become one of the most common vulnerability discovery solutions over the last decade. Fuzzing can be optimized when targeting specific programs, and accordingly, some studies have employed online optimization methods to do this automatically, i.e., to tune fuzzers for any given program in a program-agnostic manner. However, previous studies have neither fully explored mutation schemes suitable for online optimization methods, nor online optimization methods suitable for mutation schemes. In this study, we propose an optimization framework called SLOPT that encompasses both a bandit-friendly mutation scheme and mutation-scheme-friendly bandit algorithms. The advantage of SLOPT is that it can generally be incorporated into existing fuzzers, such as AFL and Honggfuzz. As a proof of concept, we implemented SLOPT-AFL++ by integrating SLOPT into AFL++ and showed that the program-agnostic optimization delivered by SLOPT enabled SLOPT-AFL++ to achieve higher code coverage than AFL++ on all ten real-world FuzzBench programs. Moreover, we ran SLOPT-AFL++ against several real-world programs from OSS-Fuzz and successfully identified three previously unknown vulnerabilities, even though these programs had been fuzzed by AFL++ for a considerable number of CPU days on OSS-Fuzz.
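SLOPT's concrete bandit/mutation pairing is not described in this abstract, so the sketch below only illustrates the general idea of bandit-driven mutation scheduling with UCB1; the operator names and the coverage-based reward are assumptions for the example, not AFL++'s actual mutation list.

```python
import math

class MutationBandit:
    """UCB1 bandit over mutation operators: a simplified stand-in for
    the bandit-friendly scheduling that frameworks like SLOPT build on."""

    def __init__(self, operators):
        self.ops = list(operators)
        self.counts = {op: 0 for op in self.ops}
        self.values = {op: 0.0 for op in self.ops}  # mean reward per op
        self.total = 0

    def choose(self):
        # Play each operator once, then follow the UCB1 index:
        # mean reward plus an exploration bonus that shrinks with use.
        for op in self.ops:
            if self.counts[op] == 0:
                return op
        return max(self.ops, key=lambda op: self.values[op]
                   + math.sqrt(2 * math.log(self.total) / self.counts[op]))

    def reward(self, op, found_new_coverage):
        # Reward 1 if the mutated input reached new coverage, else 0.
        self.total += 1
        self.counts[op] += 1
        r = 1.0 if found_new_coverage else 0.0
        self.values[op] += (r - self.values[op]) / self.counts[op]
```

In a real fuzzer the reward signal would come from the coverage bitmap after each execution; here it is abstracted to a boolean so the scheduling logic stays visible.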
Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. For datasets held by multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis, which shares dimensionality-reduced intermediate representations without iterative cross-institutional communication, may be suitable. When analyzing data that include personal information, the identifiability of the shared data is essential. In this study, the identifiability of DC analysis is investigated. The results show that the shared intermediate representations are readily identifiable with respect to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves the identifiability concern based on random sample permutation, the concept of interpretable DC analysis, and the use of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits non-readily identifiability while maintaining the high recognition performance of conventional DC analysis. For a dataset of hospitals, the proposed method shows an improvement of about 9 percentage points in recognition performance over local analysis using only local datasets.
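As a toy sketch of the sharing step (not the paper's actual DC construction, which this abstract only outlines), each institution can keep a private, non-reconstructible mapping and, following the proposed idea, permute rows before sharing so that shared vectors can no longer be lined up with individual records. The function name and the Gaussian projection are illustrative assumptions.

```python
import random

def intermediate_representation(X, k, seed, permute=True):
    """Toy DC-style sharing: map raw rows through a private random
    projection (non-reconstructible without the institution's seed)
    and, per the proposed scheme, randomly permute the rows so that
    shared vectors cannot be matched back to individual records."""
    rng = random.Random(seed)
    d = len(X[0])
    # Private d x k projection matrix, kept inside the institution.
    F = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
    Z = [[sum(x[i] * F[i][j] for i in range(d)) for j in range(k)] for x in X]
    if permute:
        rng.shuffle(Z)  # break the row-to-record correspondence
    return Z
```

Only the k-dimensional rows of `Z` leave the institution; the seed (and hence `F`) and the permutation stay private.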
Predicting new effects of drugs based on information about approved drugs can be regarded as a recommendation task. Matrix factorization is one of the most commonly used recommendation approaches, and various algorithms have been designed for it. A literature survey and summary of existing algorithms for predicting drug effects showed that most such methods, including neighborhood regularized logistic matrix factorization, which performed best in benchmark tests, use a binary matrix that considers only the presence or absence of interactions. However, drug effects are known to have two opposite aspects, such as side effects and therapeutic effects. In the present study, we propose using neighborhood regularized bidirectional matrix factorization (NRBDMF) to predict drug effects by incorporating bidirectionality, a characteristic property of drug effects. We used the proposed method to predict side effects using a matrix that considers the bidirectionality of drug effects, in which known side effects were assigned a positive (+1) label and known therapeutic effects were assigned a negative (-1) label. The NRBDMF model using bidirectional drug information achieved enrichment of side effects at the top of the predicted list and of indications at the bottom. This first attempt to consider the bidirectional nature of drug effects with NRBDMF showed that it reduces false positives and produces highly interpretable output.
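The neighborhood regularization that gives NRBDMF its name is omitted below; this is only a plain matrix factorization sketch (with made-up data) showing how the signed matrix described above, with +1 for known side effects and -1 for known therapeutic effects, can be factorized so that the two directions separate in the predicted scores.

```python
import random

def factorize_bidirectional(R, k=2, steps=4000, lr=0.05, reg=0.01, seed=0):
    """SGD matrix factorization on a signed drug-effect matrix
    (+1 = known side effect, -1 = known therapeutic effect, 0 = unknown).
    Neighborhood regularization from NRBDMF is intentionally omitted."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if R[i][j] != 0]
    for _ in range(steps):
        i, j = rng.choice(observed)
        pred = sum(U[i][f] * V[j][f] for f in range(k))
        err = R[i][j] - pred
        for f in range(k):
            u, v = U[i][f], V[j][f]
            U[i][f] += lr * (err * v - reg * u)  # gradient step with L2 penalty
            V[j][f] += lr * (err * u - reg * v)
    return U, V

def score(U, V, i, j):
    """Predicted signed score for drug i and effect j."""
    return sum(U[i][f] * V[j][f] for f in range(len(U[i])))
```

Ranking all effects of a drug by this signed score pushes likely side effects toward the top of the list and likely indications toward the bottom, mirroring the enrichment behavior the abstract reports.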
The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+\epsilon)$ error. Several works have extended this important primitive to other settings, including the online coreset, sliding window, and adversarial streaming models. However, these results hold only for $p \in \{1,2\}$, and the results for $p = 1$ require a suboptimal $\tilde O(d^2/\epsilon^2)$ samples. In this work, we design the first nearly optimal $\ell_p$ subspace embeddings for all $p \in (0,\infty)$ in the online coreset, sliding window, and adversarial streaming models. In all three models, our algorithms store $\tilde O(d^{1 \lor (p/2)}/\epsilon^2)$ rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020] and gives the first results for all $p \notin \{1,2\}$. Toward our results, we first analyze the "one-shot" scheme of sampling rows proportionally to their Lewis weights, showing a sample complexity of $\tilde O(d^{p/2}/\epsilon^2)$ for $p > 2$. Previously, this scheme was known to achieve sample complexity only $\tilde O(d^{p/2}/\epsilon^5)$, whereas $\tilde O(d^{p/2}/\epsilon^2)$ was known only if a more sophisticated recursive sampling was used. Recursive sampling cannot be implemented online, thus necessitating an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. For the complexity parameter $\mu$ introduced by [MSSW2018], we show the first lower bound, demonstrating that a linear dependence on $\mu$ is necessary.
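For background (this definition comes from the Cohen--Peng line of work cited above, not from the abstract itself): the $\ell_p$ Lewis weights $w$ of a matrix $A \in \mathbb{R}^{n \times d}$ with rows $a_i$ are the unique positive weights satisfying

```latex
w_i^{2/p} \;=\; a_i^\top \bigl( A^\top W^{1-2/p} A \bigr)^{-1} a_i ,
\qquad W = \operatorname{diag}(w_1, \dots, w_n),
% which for p < 4 can be computed by the fixed-point iteration
w_i \;\leftarrow\; \Bigl( a_i^\top \bigl( A^\top W^{1-2/p} A \bigr)^{-1} a_i \Bigr)^{p/2} .
```

"One-shot" sampling then keeps each row independently with probability proportional to $w_i$ (suitably rescaled), with no recursion, which is what makes an online implementation possible.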
The Weisfeiler-Lehman (WL) test has been widely applied in graph kernels, metrics, and neural networks. However, it considers only graph consistency, resulting in weak descriptive power for structural information, which limits the performance improvement of the methods that apply it. In addition, the similarity and distance between graphs defined by the WL test are coarse measurements. To the best of our knowledge, this paper is the first to clarify these facts and to define a metric that we call the Wasserstein WL subtree (WWLS) distance. We introduce the WL subtree as the structural information in the neighborhood of a node and assign it to each node. Then we define a new graph embedding space based on the $L_1$-approximated tree edit distance ($L_1$-TED): in this space, the $L_1$ norm of the difference between node feature vectors is the $L_1$-TED between the corresponding nodes. We further propose a fast algorithm for the graph embedding. Finally, we use the Wasserstein distance to reflect the $L_1$-TED at the graph level. WWLS can capture small structural changes that are difficult for traditional metrics to detect. We demonstrate its performance in several graph classification and metric validation experiments.
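As background for the WL subtree features used above (a standard construction, sketched here independently of the paper's WWLS pipeline), one WL iteration relabels each node by its own label together with the multiset of its neighbors' labels:

```python
def wl_subtree_labels(adjacency, labels, iterations=2):
    """Iterative WL relabeling: after h iterations, a node's label
    identifies its depth-h neighborhood subtree."""
    labels = dict(labels)
    for _ in range(iterations):
        # Each node's signature: its own label plus the sorted multiset
        # of its neighbors' labels.
        new = {v: (labels[v], tuple(sorted(labels[u] for u in nbrs)))
               for v, nbrs in adjacency.items()}
        # Compress signatures into small integer labels, as in the WL test.
        compress = {sig: i for i, sig in enumerate(sorted(set(new.values())))}
        labels = {v: compress[new[v]] for v in new}
    return labels
```

On a path graph 1-2-3 with identical initial labels, a single iteration already distinguishes the center from the endpoints, since the labels now encode depth-1 subtrees.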